The Forgotten League

Jaiden Brown

04/26/2023

Data

https://github.com/fivethirtyeight/negro-leagues-player-ratings

The GitHub repository above hosts the dataset. This analysis explains the story and the stats of many forgotten baseball stars.

Barrier to entry (minimum career length for a player to be included):

Negro leagues (NLB): 150 games as a batter, or 60 games + starts as a pitcher

MLB: 300 games as a batter or 350 games + starts as a pitcher
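The cutoffs above are stored in the data as the gameCutoff column. The sketch below lists the cutoffs the table records for each league and position, along with the smallest career game count in each group. It assumes the league, position, gameCutoff and totalGames columns shown by glimpse() in the next section. check_cutoffs is a hypothetical helper name. For pitchers the cutoff counts games + starts, so their minGames can legitimately fall below gameCutoff.

```r
library(dplyr)

# Hypothetical helper: one row per (league, position, gameCutoff) group,
# with the number of players and the fewest career games in that group.
check_cutoffs <- function(df) {
  df %>%
    group_by(league, position, gameCutoff) %>%
    summarise(players = n(), minGames = min(totalGames), .groups = "drop")
}

# Usage on the document's data:
# check_cutoffs(RawNLBandMLB)
```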

The goal of our analysis

Libraries

# install.packages("plotly")  # run once if plotly is not yet installed
library(tidyverse)  # also attaches dplyr, ggplot2 and readr
library(plotly)

Our Table

library(readr)
RawNLBandMLB <- read_csv("negro-leagues-player-ratings.csv")

glimpse(RawNLBandMLB)
## Rows: 1,117
## Columns: 25
## $ playerID     <chr> "culbech01", "gosseph01", "herrmch01", "kratzer01", "pire…
## $ commonName   <chr> "Charlie Culberson", "Phil Gosselin", "Chris Herrmann", "…
## $ league       <chr> "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "…
## $ hof          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ startYear    <dbl> 2012, 2013, 2012, 2010, 2014, 2015, 2011, 2014, 2008, 201…
## $ endYear      <dbl> 2020, 2020, 2019, 2020, 2019, 2019, 2019, 2019, 2019, 202…
## $ totalGames   <dbl> 428, 359, 370, 335, 302, 326, 461, 419, 386, 313, 376, 48…
## $ positionWar  <dbl> -0.620, 0.895, -1.150, 1.715, 0.545, 1.310, -1.555, 4.340…
## $ averageHit   <dbl> 41.791451, 72.992105, 3.648244, 21.236047, 67.574190, 10.…
## $ patience     <dbl> 13.776205, 28.641438, 70.106180, 19.112442, 18.976314, 24…
## $ power        <dbl> 41.709774, 16.879935, 44.105636, 69.670569, 37.244759, 9.…
## $ speed        <dbl> 64.524912, 58.562483, 75.850803, 1.334059, 78.872856, 81.…
## $ defense      <dbl> 24.25810, 44.89518, 36.48244, 99.59161, 38.95998, 90.4982…
## $ gameCutoff   <dbl> 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 30…
## $ playerLabel  <chr> "Active Player", "Active Player", "Active Player", "Activ…
## $ shortWar     <dbl> -0.2346729, 0.4038719, -0.5035135, 0.8293433, 0.2923510, …
## $ positionCat  <chr> "Outfielder", "Middle IF", "Catcher", "Catcher", "Middle …
## $ position     <chr> "Batter", "Batter", "Batter", "Batter", "Batter", "Batter…
## $ careerStarts <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strikeOuts   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ control      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ fip          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ whip         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ era          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ fact         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

Variables

Filtering

NLBandMLB <- RawNLBandMLB %>%
  select(playerID, commonName, league, hof, startYear, endYear, totalGames,
         positionWar, averageHit, defense, gameCutoff, playerLabel, shortWar,
         positionCat, position, era)

NLB <- NLBandMLB %>% filter(league == 'NLB')

MLB <- NLBandMLB %>% filter(league == 'MLB')

glimpse(NLBandMLB)
## Rows: 1,117
## Columns: 16
## $ playerID    <chr> "culbech01", "gosseph01", "herrmch01", "kratzer01", "pirel…
## $ commonName  <chr> "Charlie Culberson", "Phil Gosselin", "Chris Herrmann", "E…
## $ league      <chr> "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "M…
## $ hof         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ startYear   <dbl> 2012, 2013, 2012, 2010, 2014, 2015, 2011, 2014, 2008, 2018…
## $ endYear     <dbl> 2020, 2020, 2019, 2020, 2019, 2019, 2019, 2019, 2019, 2020…
## $ totalGames  <dbl> 428, 359, 370, 335, 302, 326, 461, 419, 386, 313, 376, 489…
## $ positionWar <dbl> -0.620, 0.895, -1.150, 1.715, 0.545, 1.310, -1.555, 4.340,…
## $ averageHit  <dbl> 41.791451, 72.992105, 3.648244, 21.236047, 67.574190, 10.8…
## $ defense     <dbl> 24.25810, 44.89518, 36.48244, 99.59161, 38.95998, 90.49823…
## $ gameCutoff  <dbl> 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300…
## $ playerLabel <chr> "Active Player", "Active Player", "Active Player", "Active…
## $ shortWar    <dbl> -0.2346729, 0.4038719, -0.5035135, 0.8293433, 0.2923510, 0…
## $ positionCat <chr> "Outfielder", "Middle IF", "Catcher", "Catcher", "Middle I…
## $ position    <chr> "Batter", "Batter", "Batter", "Batter", "Batter", "Batter"…
## $ era         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
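The NLB and MLB subsets should account for every row of NLBandMLB, provided league only takes the values "NLB" and "MLB". The sketch below checks that assumption. check_split is a hypothetical helper name. It also prints the league values that are actually present.

```r
library(dplyr)

# Hypothetical helper: prints the league counts and returns TRUE when the
# two subsets together cover every row of the combined table.
check_split <- function(combined, nlb, mlb) {
  print(count(combined, league))
  nrow(nlb) + nrow(mlb) == nrow(combined)
}

# Usage on the document's data:
# check_split(NLBandMLB, NLB, MLB)
```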

Could NLB players even compete in the majors?

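# Names found in both leagues, one row per name (an inner_join here would repeat
# rows). Matching on commonName assumes no two different players share a name.
duplicatedData <- semi_join(NLB, MLB, by = "commonName") %>% distinct(commonName)

PlayersInBothLeagues <- semi_join(NLBandMLB, duplicatedData, by = "commonName")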

ggplot(PlayersInBothLeagues, mapping = aes(shortWar, commonName, color = league)) +
  geom_point()
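The dot plot compares the two leagues by eye. To see each two-league player's change directly, the sketch below places his NLB and MLB shortWar side by side and sorts by the difference. It assumes each player has one row per league. A name listed more than once in a league would produce extra rows. war_by_league is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: join NLB and MLB rows on commonName, keep shortWar
# from each (suffixed _NLB / _MLB), and sort by MLB minus NLB shortWar.
war_by_league <- function(nlb, mlb) {
  inner_join(
    nlb %>% select(commonName, shortWar),
    mlb %>% select(commonName, shortWar),
    by = "commonName",
    suffix = c("_NLB", "_MLB")
  ) %>%
    mutate(diff = shortWar_MLB - shortWar_NLB) %>%
    arrange(desc(diff))
}

# Usage on the document's data:
# war_by_league(NLB, MLB)
```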

Is the distribution of WAR similar across both leagues?

ggplot(NLBandMLB, mapping = aes(league, shortWar, fill = league)) +
  geom_boxplot()
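A boxplot shows the median and quartiles. Printing those values gives the visual comparison a numeric backing. The sketch below summarises shortWar by league. summarise_war is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: player count, median and quartiles of shortWar per league,
# i.e. the numbers a boxplot draws.
summarise_war <- function(df) {
  df %>%
    group_by(league) %>%
    summarise(
      players = n(),
      median  = median(shortWar, na.rm = TRUE),
      q1      = quantile(shortWar, 0.25, na.rm = TRUE),
      q3      = quantile(shortWar, 0.75, na.rm = TRUE),
      .groups = "drop"
    )
}

# Usage on the document's data:
# summarise_war(NLBandMLB)
```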

Conclusion: Based on the boxplot, the talent in both leagues is comparable.

Who were the very best in the NLB, and how do they compare to MLB players?

Who are the superstars in the NLB?
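One way to surface the NLB's stars is to rank its players by shortWar, the same measure used in the plots above. The sketch below does this. top_stars is a hypothetical helper name, and the choice of ranking column is an assumption. Another rating could be substituted.

```r
library(dplyr)

# Hypothetical helper: the n NLB players with the highest shortWar.
top_stars <- function(df, n = 10) {
  df %>%
    filter(league == "NLB") %>%
    arrange(desc(shortWar)) %>%
    select(commonName, position, positionCat, shortWar, hof) %>%
    slice_head(n = n)
}

# Usage on the document's data:
# top_stars(NLBandMLB)
```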

Where would the NLB batters rank all time?

## # A tibble: 20 × 5
##    commonName       league averageHit shortWar   hof
##    <chr>            <chr>       <dbl>    <dbl> <dbl>
##  1 Ty Cobb          MLB         100       8.02     1
##  2 Charlie Smith    NLB         100      10.3      0
##  3 Nap Lajoie       MLB          99.9     7.09     1
##  4 Ed Delahanty     MLB          99.9     7.97     1
##  5 Ted Williams     MLB          99.9     8.92     1
##  6 Rogers Hornsby   MLB          99.9     9.23     1
##  7 Tris Speaker     MLB          99.8     7.70     1
##  8 Rod Carew        MLB          99.8     5.05     1
##  9 Tony Gwynn       MLB          99.7     4.45     1
## 10 Josh Gibson      NLB          99.7    10.9      1
## 11 George Sisler    MLB          99.7     4.18     1
## 12 Wade Boggs       MLB          99.6     5.96     1
## 13 Honus Wagner     MLB          99.6     8.25     1
## 14 Stan Musial      MLB          99.6     6.83     1
## 15 Harry Heilmann   MLB          99.5     5.31     1
## 16 Roberto Clemente MLB          99.5     5.84     1
## 17 Eddie Collins    MLB          99.4     7.00     1
## 18 Heavy Johnson    NLB          99.4     5.42     0
## 19 Jose Altuve      MLB          99.4     4.50     0
## 20 Babe Ruth        MLB          99.3    11.1      1
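The code behind the table above is not shown. The sketch below should yield a table of the same shape. It assumes the table lists batters (position == "Batter") sorted by averageHit in descending order, keeping the first 20 rows. top_batters is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: batters sorted by averageHit (highest first), showing
# the same five columns as the printed table.
top_batters <- function(df, n = 20) {
  df %>%
    filter(position == "Batter") %>%
    arrange(desc(averageHit)) %>%
    select(commonName, league, averageHit, shortWar, hof) %>%
    slice_head(n = n)
}

# Usage on the document's data:
# top_batters(NLBandMLB)
```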

Well, who was pitching to these batters?
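A matching list for pitchers can be built the same way. The glimpse output above only shows the "Batter" value of position, so the sketch below assumes pitchers are labelled "Pitcher" and makes the label a parameter. It ranks by shortWar rather than era because the meaning and scale of the era column are not described here. top_pitchers is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: the n pitchers with the highest shortWar.
# pitcher_label is an assumption; check unique(NLBandMLB$position) first.
top_pitchers <- function(df, n = 20, pitcher_label = "Pitcher") {
  df %>%
    filter(position == pitcher_label) %>%
    arrange(desc(shortWar)) %>%
    select(commonName, league, shortWar, era, hof) %>%
    slice_head(n = n)
}

# Usage on the document's data:
# top_pitchers(NLBandMLB)
```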

How did their WAR compare to that of their counterparts?
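To compare each league's elite pitchers with their counterparts, the sketch below takes the top n players by shortWar within each league and reports the mean and the lowest shortWar of that group. It assumes the "Pitcher" label used above. top_n_by_league is a hypothetical helper name. with_ties = FALSE keeps exactly n rows per league.

```r
library(dplyr)

# Hypothetical helper: for each league, the mean and minimum shortWar of
# its top-n players at the given position.
top_n_by_league <- function(df, n = 20, pos = "Pitcher") {
  df %>%
    filter(position == pos) %>%
    group_by(league) %>%
    slice_max(shortWar, n = n, with_ties = FALSE) %>%
    summarise(meanTopWar = mean(shortWar), minTopWar = min(shortWar), .groups = "drop")
}

# Usage on the document's data:
# top_n_by_league(NLBandMLB)
```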

Who were these players?
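The raw table has a fact column (see the first glimpse output), but it was dropped by the select() step above. The sketch below pulls it back from RawNLBandMLB for NLB players who have one, sorted by shortWar. player_facts is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: NLB players with a non-missing fact, highest shortWar first.
player_facts <- function(raw, league_code = "NLB") {
  raw %>%
    filter(league == league_code, !is.na(fact)) %>%
    arrange(desc(shortWar)) %>%
    select(commonName, position, shortWar, fact)
}

# Usage on the document's data (fact is only in the raw table):
# player_facts(RawNLBandMLB)
```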

Summary

After looking at and comparing the data, I believe MLB was right to recognize these players and add their stats to the MLB record. The two leagues had similar levels of competition, and many players who came out of the NLB produced at similar, if not higher, levels in the MLB than they did in the NLB.